Newest 'class-imbalance+imbalanced-learn' Questions

0 votes

1 answer

40 views

SVC labels entire sample majority class, even after using ADASYN

I have an imbalanced sample (850 in group X vs 100 in group Y). I am trying to predict group membership using support vector classification. I am using 'Adaptive Synthetic' (ADASYN) to oversample the ...

Vincent

103

asked Aug 20, 2024 at 16:36

4 votes

0 answers

69 views

How do you know that your classifier is suffering from class imbalance?

Inspired by @Dave's question "Why does data science see class imbalance as a problem for supervised learning when statistics does not?", I am re-posting a question I posed on the stats SE to ...

Dikran Marsupial

590

asked May 30, 2024 at 10:59

6 votes

3 answers

296 views

Reproducible examples where balancing the training data demonstrably improves accuracy

I asked this question on the Statistics SE, but there were no answers, even when a modest bonus was available, so I am asking here to see if any examples can be given. I have been looking into the ...

Dikran Marsupial

590

asked Apr 18, 2023 at 11:51

1 vote

1 answer

1k views

I used SMOTE-ENN to balance my dataset and it improved the performance metrics, but how can I be sure it's not overfitting?

The models were evaluated using 10-fold cross-validation: foldCount = StratifiedKFold(10, shuffle=True, random_state=1). The models in question are XGBoost. ...

Tariq

15

asked Mar 20, 2023 at 10:09

2 votes

2 answers

2k views

How to calculate accuracy of an imbalanced dataset

I would like to understand what accuracy means for an imbalanced dataset. Let's suppose we have a medical dataset and we want to predict the disease among the patients. Say, in an existing dataset 95% of ...

Encipher

381

asked Sep 10, 2022 at 18:05

0 votes

1 answer

94 views

Do I need to use AUPRC for reporting classification results on an imbalanced dataset when the model was trained using upsampling and CV

I am working on a binary classification problem whose dataset has about 5% positive-class samples. I split the dataset, 70% for training and 30% for testing. I used the test data only once for ...

Paul

1

asked Aug 17, 2022 at 15:37

0 votes

1 answer

130 views

How to effectively evaluate a model with highly imbalanced and limited dataset

Most data imbalance questions on this stack have been asking how to learn a better model, but I tend to think another problem is how we define "better" (i.e. fairly evaluate the learned ...

jasperhyp

23

asked Jul 16, 2022 at 15:22

1 vote

1 answer

431 views

Class imbalance: Will transforming multi-label (aka multi-task) to multi-class problem help?

I noticed this and this question, but my problem is more about class imbalance. So now I have, say, 1000 targets and some input samples (with some feature vectors). Each input sample can have label ...

jasperhyp

23

asked Feb 24, 2022 at 0:41

0 votes

1 answer

78 views

Give more weight to features based on distribution plot

I have a task to predict a binary variable, purchase; the dataset is strongly imbalanced (10:100) and the models I have tried so far (mostly ensembles) fail. In ...

robsanna

101

asked Feb 10, 2022 at 14:53

0 votes

1 answer

78 views

Over-sampling when predicting a continuous variable

Let's say I am predicting house selling prices (continuous) and therefore have multiple independent variables (numerical and categorical). Is it common practice to balance the dataset when the ...

Kev

9

asked Jan 20, 2022 at 17:07

0 votes

1 answer

249 views

Explaining the logic behind the pipe_line method for cross-validation of imbalanced datasets

Reading the following article: https://kiwidamien.github.io/how-to-do-cross-validation-when-upsampling-data.html There is an explanation of how to use ...

PwNzDust

149

asked Jan 1, 2022 at 19:14

0 votes

1 answer

2k views

Handling Imbalanced Datasets in Orange

I work in the medical domain, so class imbalance is the rule and not the exception. While I know Python has packages for class imbalance, I don't see an option in Orange for e.g. a SMOTE widget. I ...

Bob Hoyt

11

asked Feb 21, 2021 at 21:55

3 votes

1 answer

829 views

What does IBA mean in imblearn classification report?

imblearn is a Python library for handling imbalanced data. Code for generating a classification report is given below. ...

codeczar

153

asked Jan 21, 2021 at 17:49

2 votes

1 answer

3k views

Using SMOTENC in a pipeline

I am trying to figure out the appropriate way to build a pipeline to train a model which includes using the SMOTENC algorithm: Given that the N-Nearest Neighbors algorithm and Euclidean distance are ...

thereandhere1

775

asked Jun 22, 2020 at 15:34

1 vote

2 answers

808 views

Cross validation schema for imbalanced dataset

Based on a previous post, I understand the need to ensure that the validation folds during the CV process have the same imbalanced distribution as the original dataset when training a binary ...

thereandhere1

775

asked Jun 16, 2020 at 16:02
